
Welcome back to deep learning and today we want to talk a bit about gated recurrent units,

a simplification of the LSTM cell.

So again, recurrent neural networks: gated recurrent units.

So the idea here is that the LSTM is of course great, but it has a lot of parameters and

it's kind of difficult to train.

So Cho came up with the gated recurrent unit and it was introduced in 2014 for statistical

machine translation.

You could argue it's a variant of the LSTM, but it's simpler and it has fewer parameters.

So this is the general setup.

You can see we don't have two different memories like in the LSTM.

We only have one hidden state, but one similarity to the LSTM is that the hidden state flows only along a linear chain.

So you only see multiplications and additions here.

And again, as in the Elman cell, we produce the outputs from the hidden state.

So let's have a look into the ideas that Cho had in order to propose this GRU cell.

Well, it takes the concepts from the LSTMs and it controls the memory by gates.

The main difference is there's no additional cell state.

So the memory only operates directly via the hidden state and the update of the state can

be divided into four steps.

There's a reset gate that is controlling the influence of the previous hidden state.

And there is an update gate that controls how much of the newly computed update is introduced.

The next step then proposes a candidate hidden state, which is finally used to update the hidden state, as summarized in the equations below.
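For reference, these four steps can be written compactly as follows (a standard formulation consistent with the description in this lecture; W_r, W_z, W_h and b_r, b_z, b_h denote the respective weight matrices and biases, \sigma is the sigmoid, and \odot is pointwise multiplication):

\begin{align}
r_t &= \sigma\big(W_r\,[h_{t-1}, x_t] + b_r\big) \\
z_t &= \sigma\big(W_z\,[h_{t-1}, x_t] + b_z\big) \\
\tilde{h}_t &= \tanh\big(W_h\,[r_t \odot h_{t-1}, x_t] + b_h\big) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{align}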

So how does this work?

Well, first we determine the influence of the previous hidden state.

And this is done by a sigmoid activation function.

Here we again have a matrix: we concatenate the input and the previous hidden state, multiply with a matrix W_r, and add some bias.

This is then fed to the sigmoid activation function, which produces the reset value r_t.
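As a minimal sketch of this reset gate in NumPy (the helper names sigmoid and reset_gate and the parameter names W_r, b_r are chosen here for illustration, not taken from the lecture):

```python
import numpy as np

def sigmoid(a):
    # logistic function, maps each entry into (0, 1)
    return 1.0 / (1.0 + np.exp(-a))

def reset_gate(h_prev, x_t, W_r, b_r):
    # r_t = sigmoid(W_r [h_{t-1}, x_t] + b_r)
    concat = np.concatenate([h_prev, x_t])  # stack previous hidden state and input
    return sigmoid(W_r @ concat + b_r)      # one gate value per hidden unit
```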

Next we produce z_t, the update gate value that will later control how much of the newly proposed hidden state is taken over.

This is again produced by a sigmoid function, where we concatenate the last hidden state and the input vector, multiply with a matrix W_z, and add some bias.
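Continuing the illustrative sketch above, the update gate looks the same, just with its own weights W_z and b_z (again hypothetical names):

```python
def update_gate(h_prev, x_t, W_z, b_z):
    # z_t = sigmoid(W_z [h_{t-1}, x_t] + b_z)
    concat = np.concatenate([h_prev, x_t])
    return sigmoid(W_z @ concat + b_z)
```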

Next we propose the update: we combine the input and the reset-gated hidden state.

This is done in the following manner: the update proposal h tilde is produced by a tangens hyperbolicus (tanh), where we take the reset gate times the last hidden state.

So we essentially remove entries that we don't want to see from the last hidden state, concatenate x_t, multiply with some matrix W_h, and add some bias b_h.

This is then fed to the tangens hyperbolicus to produce the update proposal.
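In the same hypothetical NumPy sketch, the update proposal could look like this:

```python
def proposal(h_prev, x_t, r_t, W_h, b_h):
    # h_tilde = tanh(W_h [r_t * h_{t-1}, x_t] + b_h)
    gated_h = r_t * h_prev                   # pointwise reset of the old state
    concat = np.concatenate([gated_h, x_t])  # then concatenate the input x_t
    return np.tanh(W_h @ concat + b_h)
```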

Now with the update proposal we go to the update gate.

And the update gate controls the combination of the old state and the proposed state.

So we compute the new state as 1 minus z_t, you remember this is the intermediate variable that we computed earlier, pointwise multiplied with the old state, plus z_t, again pointwise, multiplied with the proposed update.

So essentially the sigmoid function that produced z_t is now used to select, entry by entry, whether to keep the old information from the old state or to replace it with information from the proposed update.
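In the same sketch, this convex combination of old state and proposal is a single line:

```python
def new_state(h_prev, h_tilde, z_t):
    # h_t = (1 - z_t) * h_{t-1} + z_t * h_tilde   (all products pointwise)
    return (1.0 - z_t) * h_prev + z_t * h_tilde
```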

This then gives the new hidden state, and from the new hidden state we produce the new output.
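Putting the four illustrative helpers above together gives one full GRU step; the parameter packing and sizes below are arbitrary choices for the example, not from the lecture:

```python
def gru_step(h_prev, x_t, params):
    # One full GRU update: reset gate, update gate, proposal, combination.
    W_r, b_r, W_z, b_z, W_h, b_h = params
    r_t = reset_gate(h_prev, x_t, W_r, b_r)
    z_t = update_gate(h_prev, x_t, W_z, b_z)
    h_tilde = proposal(h_prev, x_t, r_t, W_h, b_h)
    return new_state(h_prev, h_tilde, z_t)

# Tiny usage example with random parameters (hidden size 4, input size 3):
rng = np.random.default_rng(0)
n_h, n_x = 4, 3
shape = (n_h, n_h + n_x)
params = (0.1 * rng.normal(size=shape), np.zeros(n_h),   # W_r, b_r
          0.1 * rng.normal(size=shape), np.zeros(n_h),   # W_z, b_z
          0.1 * rng.normal(size=shape), np.zeros(n_h))   # W_h, b_h
h = np.zeros(n_h)
for x_t in rng.normal(size=(5, n_x)):  # run five time steps
    h = gru_step(h, x_t, params)
```

The output of the cell is then read off this new hidden state, just as in the Elman cell.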


Deep Learning - Recurrent Neural Networks Part 4

This video discusses Gated Recurrent Units and compares them to Elman and LSTM Cells.
